334 PART 6 Analyzing Survival Data

software estimates the coefficients of the predictor variables that make the pre-

dicted survival curves agree as much as possible with the observed survival times

of each participant.

How does PH regression determine these regression coefficients? The short

answer is, “You’ll be sorry you asked!” The longer answer is that, like many other

kinds of regression, PH regression is based on maximum likelihood estimation.

The software uses the data to build a long, complicated expression for the proba-

bility of one particular individual in the data dying at any point in time. This

expression involves that individual’s predictor values and the regression coeffi-

cients. Next, the software constructs a longer expression that includes the likeli-

hood of getting exactly the observed survival times for all the participants in the

data set. And if this isn’t already complicated enough, the expression has to deal

with the issue of censored data. At this point, the software seeks to find the values

of the regression coefficients that maximize this very long likelihood expression

(similar to the way maximum likelihood is described with logistic regression in

Chapter 18).
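The maximizing step described above can be illustrated with a minimal sketch. It rests on several assumptions: it uses the partial likelihood commonly associated with Cox PH regression (a simpler expression than the full likelihood described in the text), it handles only one predictor, it assumes no tied event times, and the function names and the six-participant data set are hypothetical. Real software uses faster and more general methods.

```python
import math

def cox_partial_log_likelihood(beta, times, events, x):
    """Log partial likelihood for a one-predictor PH model (assumes no tied times)."""
    total = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored participants contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i
        risk = [math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= t_i]
        total += beta * x[i] - math.log(sum(risk))
    return total

def fit_beta(times, events, x, lo=-5.0, hi=5.0, tol=1e-8):
    """Find the coefficient that maximizes the log partial likelihood.

    Uses a golden-section search on [lo, hi]; this works because the
    log partial likelihood has a single peak in beta.
    """
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if (cox_partial_log_likelihood(a, times, events, x)
                < cox_partial_log_likelihood(b, times, events, x)):
            lo = a  # the peak lies to the right of a
        else:
            hi = b  # the peak lies to the left of b
    return (lo + hi) / 2

# Hypothetical data: follow-up time, event flag (1 = died, 0 = censored),
# and a binary predictor (1 = smoker, 0 = nonsmoker)
times = [5, 8, 12, 15, 20, 24]
events = [1, 1, 0, 1, 1, 0]
smoker = [1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, smoker)
print("estimated coefficient:", beta_hat)
```

Notice that the censored participants never add a term of their own to the sum. They still count, though, because they sit in the risk set of every death that happened while they were under observation.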

Hazard ratios

Hazard ratios (HRs) are the estimates of relative risk obtained from PH regression.

HRs in survival regression play a role similar to the one odds ratios play in logistic

regression. They’re also calculated the same way from regression output — by

exponentiating the regression coefficients:

» In logistic regression: $\text{Odds ratio} = e^{\text{Regression Coefficient}}$

» In PH regression: $\text{Hazard ratio} = e^{\text{Regression Coefficient}}$
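As a quick illustration of the exponentiating step, here is a small sketch. The coefficient value is hypothetical and does not come from any real regression output.

```python
import math

# Hypothetical regression coefficient, as it might appear in PH regression output
coefficient = 0.0488

# Exponentiate the coefficient to get the hazard ratio
hazard_ratio = math.exp(coefficient)
print(round(hazard_ratio, 2))  # 1.05
```

Going the other way, the natural log of an HR (math.log in Python) gives back the regression coefficient.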

Keep in mind that hazard is the chance of dying in any small period of time. For

each predictor variable in a PH regression model, a coefficient is produced that,

when exponentiated, equals the HR. The HR tells you the factor by which the hazard

rate is multiplied when you increase the variable's value by exactly 1.0 unit. For a

binary predictor, that means comparing participants positive for the predictor with

the comparison group. Therefore, an HR's numerical value depends on the units in

which the variable is expressed in

your data. And for categorical predictors, interpreting the HR depends on how you

code the categories.
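To see how units and coding change an HR, consider this short sketch. The helper names and the numbers are hypothetical. It relies only on the fact that an HR is an exponentiated coefficient, so multiplying the units by k raises the HR to the power k, and swapping the reference category of a two-level predictor flips the sign of the coefficient.

```python
def rescale_hr(hr_per_unit, units):
    """HR for a change of `units` units, given the HR for a 1-unit change."""
    return hr_per_unit ** units

def flip_reference(hr):
    """HR after swapping which category of a binary predictor is the reference."""
    return 1.0 / hr

# Hypothetical: age recorded in years, with an HR of 1.02 per year
hr_per_decade = rescale_hr(1.02, 10)
print(round(hr_per_decade, 2))  # 1.22 per 10 years

# Hypothetical: HR of 2.0 for treated vs. untreated becomes 0.5 for untreated vs. treated
print(flip_reference(2.0))  # 0.5
```

The model is the same in every case. Only the way the HR is reported changes, which is why you should always state the units and the reference category alongside an HR.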

For example, if a survival regression model in a study of emphysema patients

includes number of cigarettes smoked per day as a predictor of survival, and if the

HR for this variable comes out equal to 1.05, then a participant’s chances of dying

at any instant increase by a factor of 1.05 (5 percent) for every additional cigarette

smoked per day. A 5 percent increase may not seem like much, but it’s applied for

every additional cigarette per day. A person who smokes one pack (20 cigarettes)